Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors

Authors

Yichi Zhou

Jun Zhu

Jingwei Zhuo

Abstract

Thompson sampling has impressive empirical performance on many multi-armed bandit problems. However, current algorithms for Thompson sampling work only with conjugate priors, because they need to infer the posterior, which is often computationally intractable when the prior is not conjugate. In this paper, we propose a novel algorithm for Thompson sampling that only requires drawing samples from a tractable distribution, so it remains efficient even when the prior is non-conjugate. To do this, we reformulate Thompson sampling as an optimization problem via the Gumbel-Max trick. We then construct a set of random variables, and the goal becomes identifying the one with the highest mean. Finally, we solve this identification problem with techniques from best arm identification.
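The Gumbel-Max trick mentioned in the abstract can be illustrated in isolation. The sketch below is not the paper's algorithm; it only shows the general identity that adding independent standard Gumbel noise to unnormalized log-weights and taking the argmax yields a sample from the corresponding categorical distribution. The function names (`standard_gumbel`, `gumbel_max_sample`, `empirical_frequencies`) and the example weights are assumptions made for illustration.

```python
import math
import random


def standard_gumbel(rng):
    """Draw one standard Gumbel(0, 1) variate by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # guard against log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(log_weights, rng):
    """Return index i with probability proportional to exp(log_weights[i]).

    Sampling is turned into an optimization: argmax_i (log_weights[i] + G_i),
    where the G_i are independent standard Gumbel variates.
    """
    perturbed = [lw + standard_gumbel(rng) for lw in log_weights]
    return max(range(len(perturbed)), key=perturbed.__getitem__)


def empirical_frequencies(log_weights, num_draws=20000, seed=0):
    """Estimate the selection frequency of each index over repeated draws."""
    rng = random.Random(seed)
    counts = [0] * len(log_weights)
    for _ in range(num_draws):
        counts[gumbel_max_sample(log_weights, rng)] += 1
    return [c / num_draws for c in counts]


if __name__ == "__main__":
    # Hypothetical unnormalized weights 1 : 2 : 7, i.e. target
    # probabilities 0.1, 0.2 and 0.7 after normalization.
    weights = [1.0, 2.0, 7.0]
    print(empirical_frequencies([math.log(w) for w in weights]))
```

The weights never need to be normalized, which is the property that makes this kind of reformulation attractive when a normalizing constant is hard to compute.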


Similar resources

Horvitz-Thompson estimator of population mean under inverse sampling designs

Inverse sampling design is generally considered an appropriate technique when the population is divided into two subpopulations, one of which contains only a few units. In this paper, we derive the Horvitz-Thompson estimator for the population mean under inverse sampling designs, where subpopulation sizes are known. We then introduce an alternative unbiased estimator, corresponding to post-st...


Bayesian Mixture Modeling and Inference based Thompson Sampling in Monte-Carlo Tree Search

Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning and learning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we present a novel approach for MCTS using Bayesian mixture modeling and inference based Thompson sampling and apply it to the problem of online planning in MDPs. Our algorith...


Bayesian Mixture Modelling and Inference based Thompson Sampling in Monte-Carlo Tree Search


Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors

In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and...


MATHEMATICAL ENGINEERING TECHNICAL REPORTS Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors



Journal:

CoRR

Volume abs/1708.04781, issue not listed

Pages: -

Publication date: 2017
